Antipattern: Assume You Must Use Files
Let's learn the drawbacks of storing images for web applications.
Conceptually, an image is an attribute in a table. For example, the Accounts
table may have a portrait_image
column.
Likewise, we can store multiple images of the same type in a dependent table. For example, a bug may have multiple screenshots that illustrate it.
That much is straightforward, but choosing the data type for an image is a subject of controversy. Raw binary data for an image can be stored in a BLOB
data type, as shown previously. However, many people instead store the image as a file on the filesystem and store the path to this file as a VARCHAR
.
Software developers argue passionately about this issue. There are good reasons for both solutions, but it’s common for many programmers to be unequivocally convinced that we should always store files external to the database. However, there are several real risks that are associated with this design, and these are described in the following sections.
Files don’t obey DELETE
#
The first problem is that of garbage collection. If our images are outside the database and we delete the row that contains the path, there is no way for the file named by that path to be removed automatically.
Unless we design our application to remove these “orphaned” image files as we delete the database row that references them, they will accumulate.
Files don’t obey transaction isolation#
Usually when we update or delete data, these changes aren’t visible to other clients until we finish our transaction with COMMIT
.
However, any changes we make to files outside the database don’t work this way. If we remove a file, it is immediately inaccessible to other clients. And if we change the contents of the file, other clients see those changes immediately instead of seeing the previous contents of the file while our transaction is still uncommitted.
<?php
$stmt = $pdo->query("DELETE FROM Screenshots
WHERE bug_id = 1234 AND image_id =1");
unlink('images/screenshot1234-1.jpg');
// Other clients still see the row in the database,
// but not the image file.
$pdo->commit();
?>
In practice, these kinds of anomalies may be infrequent. Also, the impact is minor in this example; a missing image is hardly rare in a web application. But in other scenarios, the consequences could be worse.
Files don’t obey ROLLBACK
#
It’s normal to roll back transactions in case of errors, or even if the logic of our application requires us to cancel changes. However, rolling back transactions does not roll back changes made on a file that was changed as a result of the transaction as well.
For example, suppose that we remove a screenshot file as we execute a DELETE
statement to remove the corresponding row in the database. If we roll back this change, the deletion of the row in the database is reversed, but the file is still gone.
<?php
$stmt = $pdo->query("DELETE FROM Screenshots
WHERE bug_id = 1234 AND image_id =1");
unlink("images/screenshot1234-1.jpg");
$pdo->rollback();
?>
The row in the database is restored but not the image file.
Files don’t obey database backup tools#
Most database brands provide a client tool to assist in backing up a database that is in use. For example, MySQL provides mysqldump
, Oracle provides rman
, PostgreSQL provides pg_dump
, SQLite provides the .dump
command, and so on. Using a backup tool is important because if other clients are making changes concurrently, our backup could end up containing partial changes, at times even breaking referential integrity or making the backup corrupt and useless for recovery.
A backup tool doesn’t know how to include files referenced by the pathname in a VARCHAR
column of a table. So when we back up a database, we need to remember a two-step process: use the database backup tool, and then use a filesystem backup tool for the collection of external image files.
Even if we include the external files with the backup, it’s hard to ensure that copies of these files are in sync with the transaction we used to back up the database. Applications may add or change image files at any time, sometimes even only a moment after we begin our database backup.
Files don’t obey SQL access privileges#
External files circumvent any privileges that are assigned with the GRANT
and REVOKE
SQL statements. SQL privileges manage access to tables and columns, but they don’t apply to external files named by strings in the database.
Files are not SQL data types#
The path stored in screenshot_path
is merely a string. The database doesn’t verify if the string is a valid pathname, nor can the database verify if the file exists at the path we have entered. If the file is renamed, moved, or deleted, the database won’t update the string in the database automatically. Any logic that treats this string as a pathname depends on the code that we write in our application.
<?php
define('DATA_DIRECTORY', '/var/bugtracker/data/');
$stmt = $pdo->query("SELECT image_path FROM Screenshots
WHERE bug_id = 1234 AND image_id = 1");
$row = $stmt->fetch();
$image_path = $row[0];
// Read the actual image -- I hope the path is correct!
$image = file_get_contents(DATA_DIRECTORY . $image_path);
?>
One advantage of using a database is that it helps us preserve data integrity. When we put some of our data in external files, we circumvent this advantage, and we have to write application code to perform checks that the database should handle.